Push compute engine value loading for longs down to tsdb codec. #132622

Conversation

martijnvg
Member

This is the first of many changes that push loading of field values down to the es819 doc values codec for logsdb/tsdb, when the field supports it.

This change first targets reading field values in bulk mode at the codec level when the doc values type is numeric or sorted, there is only one value per document, and the field is dense (all documents have a value). Multivalued and sparse fields are more complex to support bulk reading for, but it is possible.
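For context, a rough sketch of the difference between the two modes (the per-document loop uses Lucene's standard NumericDocValues API; the bulk call uses the appendLongs method this PR introduces, and the variable names here are illustrative):

// Per-document loading: advance the iterator and copy one value at a time.
for (int i = offset; i < docs.count(); i++) {
    if (numericDocValues.advanceExact(docs.get(i))) {
        builder.appendLong(numericDocValues.longValue());
    }
}

// Bulk loading: for a dense run of doc IDs, the codec hands over a whole
// decoded block of values in a single array copy.
builder.appendLongs(decodedBlock, startIndex, runLength);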

With this change, the following field types will support bulk read mode at the codec level under the described conditions: long, date, geo_point, point and unsigned_long.

Other number types like integer, short, double, float and scaled_float will be supported in a followup; they would be similar to long-based fields but require an additional conversion step to either an int or float vector.

This change originates from #132460 (which adds bulk reading to @timestamp, _tsid and dimension fields) and is basically the @timestamp support part of it. In another followup, support for single valued, dense sorted (set) doc values will be added for fields like _tsid.

Relates to #128445

Given that the optimization targets specific doc value fields produced by long field mappers, I experimented with the following query: FROM metrics-hostmetricsreceiver.otel-default | STATS min(@timestamp), max(@timestamp).
The metrics-hostmetricsreceiver.otel-default data stream contains 270 minutes of metrics, has 221184000 docs, and takes up 8.5 GB of storage. On my local machine, the query time without this change is ~180ms and with this change ~70ms.

Flamegraph without this change:

ESQL profiling of the query (with data_partitioning set to shard) without this change:

{
    "operator": "ValuesSourceReaderOperator[fields = [@timestamp]]",
    "status": {
        "readers_built": {
            "@timestamp:column_at_a_time:BlockDocValuesReader.SingletonLongs": 55
        },
        "values_loaded": 221184000,
        "process_nanos": 1155100354, <--- ~1155ms
        "pages_received": 10150,
        "pages_emitted": 10150,
        "rows_received": 221184000,
        "rows_emitted": 221184000
    }
}

Flamegraph with this change:

ESQL profiling of the query (with data_partitioning set to shard) with this change:

{
    "operator": "ValuesSourceReaderOperator[fields = [@timestamp]]",
    "status": {
        "readers_built": {
            "@timestamp:column_at_a_time:BlockDocValuesReader.BulkSingletonLong": 55
        },
        "values_loaded": 221184000,
        "process_nanos": 218763289, <-- ~218ms
        "pages_received": 10150,
        "pages_emitted": 10150,
        "rows_received": 221184000,
        "rows_emitted": 221184000
    }
}

@martijnvg martijnvg added the :StorageEngine/Mapping label Aug 10, 2025
@martijnvg martijnvg requested a review from dnhatn August 11, 2025 01:19
@martijnvg martijnvg marked this pull request as ready for review August 11, 2025 01:19
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics and Team:StorageEngine labels Aug 11, 2025
@elasticsearchmachine
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

/**
* Specialized builder for collecting dense arrays of long values.
*/
interface SingletonBulkLongBuilder extends Builder {
Member Author


The plan is to reuse this builder interface for other number field types too, and even for ordinal-based fields, given that at the codec level everything is stored as long[]. For other non-long field types we need a conversion step, but that can happen in the build() method, for example converting to int[] using Math.toIntExact(...) in a simple loop. So I don't expect us to introduce more interfaces here.
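A minimal sketch of that conversion step (values and count stand for the builder's internal buffer and are illustrative; Math.toIntExact is the JDK method that throws on overflow):

// Inside a hypothetical build() implementation: convert the buffered
// long[] values to int[] with one simple loop before building the vector.
int[] ints = new int[count];
for (int i = 0; i < count; i++) {
    ints[i] = Math.toIntExact(values[i]);
}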

Member


I think we need a SingletonInt instead for ordinals.

Member Author


I think that makes sense if we generalize the builder.

Member

@dnhatn dnhatn left a comment


Wow - more than a 5x speedup, impressive! Great changes; however, I think we should make them less invasive and more contained. Thanks, Martijn! I'm looking forward to seeing this PR merged.

bulkReader = new BulkReader() {

@Override
public void bulkRead(BlockLoader.SingletonBulkLongBuilder builder, BlockLoader.Docs docs, int offset)
Member


I think it would be more consistent to implement BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException instead.

}

-    private static class SingletonLongs extends BlockDocValuesReader {
+    static class SingletonLongs extends BlockDocValuesReader {
Member


Can we enable the optimization in BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) of this class only?

public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset) throws IOException {
    if (numericDocValues instanceof ... r) {
        return r.read(factory, docs, offset);
    }
    ...
}

Member Author


I think that should work as well.

Member Author


It took me a while, but this works out and it is now even simpler! 054b12e

@@ -1013,7 +1013,8 @@ public Function<byte[], Number> pointReaderIfPossible() {
     @Override
     public BlockLoader blockLoader(BlockLoaderContext blContext) {
         if (hasDocValues()) {
-            return new BlockDocValuesReader.LongsBlockLoader(name());
+            var indexMode = blContext.indexSettings().getMode();
+            return new BlockDocValuesReader.LongsBlockLoader(name(), indexMode);
Member


Do we need to pass indexMode? I think we can always enable optimizations if the underlying doc_values are dense and use our codec.

Member Author


Good point, we can just check the implementation of the numeric doc values. This should be sufficient.

Member Author


pushed: be0c77c


@@ -498,6 +509,14 @@ interface IntBuilder extends Builder {
IntBuilder appendInt(int value);
}

/**
Member


Can we rename this to SingletonLongBuilder and add support for appending a single long? I think we can use this builder when doc_values is dense, even if it's not from our codec. Also, we should consider extending LongVectorFixedBuilder to support bulking, but that's out of scope for this PR.
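A hedged sketch of what the renamed builder could look like (appendLongs matches the signature that appears later in this PR; appendLong is the single-value addition suggested here):

interface SingletonLongBuilder extends Builder {
    // Append one value, for paths that cannot read in bulk.
    SingletonLongBuilder appendLong(long value);

    // Append a dense run of values copied straight from the codec.
    SingletonLongBuilder appendLongs(long[] values, int from, int length);
}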

Member Author

@martijnvg martijnvg Aug 12, 2025


Renamed the class: 2924c402368fd58bc13adea5a943d8afa2fda963

I think we can use this builder when doc_values is dense, even if it's not from our codec.

I think so too; we would need to check numericDocValues#cost() == maxDoc in BlockDocValuesReader.SingletonLongs?

Also, we should consider extending LongVectorFixedBuilder to support bulking, but it's not an issue of this PR.

👍

Member Author


I pushed f097f4a, to use singleton long builder when we're dense even when not using es819 doc value codec.

@martijnvg martijnvg requested a review from dnhatn August 12, 2025 05:00
Member

@dnhatn dnhatn left a comment


I've left more comments, but we're close. Thanks, Martijn!

-        if (numericDocValues.advanceExact(doc)) {
+        if (numericDocValues instanceof BulkNumericDocValues bulkDv) {
+            return bulkDv.read(factory, docs, offset);
+        } else if (isDense) {
Member


I think it's unsafe to use cost for this. Would you mind reverting this part? We can find a way to enable it later. We need to do something similar to FieldExistsQuery#rewrite.

Member Author


Ok, I will revert.

I see that FieldExistsQuery#rewrite(...) relies on Terms#getDocCount(), and for that we need an inverted index for the same field. I think doc value skippers could also be used. But let's figure this out in another change.
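For illustration, a sketch of the FieldExistsQuery#rewrite-style check (leafReader and field are assumed to be in scope; this only works when the field is also indexed with terms):

// A field is provably dense when every document in the segment has at
// least one term for it, i.e. docCount == maxDoc.
Terms terms = leafReader.terms(field);
boolean dense = terms != null && terms.getDocCount() == leafReader.maxDoc();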

int remainingBlockLength = ES819TSDBDocValuesFormat.NUMERIC_BLOCK_SIZE - blockInIndex;
for (int newLength = remainingBlockLength; newLength > 1; newLength = newLength >> 1) {
int lastIndex = i + newLength - 1;
if (lastIndex < docsCount && isDense(index, docs.get(lastIndex), newLength)) {
Member


I like this logic! Can we limit remainingBlockLength to the min of (ES819TSDBDocValuesFormat.NUMERIC_BLOCK_SIZE - blockInIndex, docsCount - i) to allow a single copy of the last block? Note that there could be an issue with this logic for Lookup Join and Enrich, as the same doc IDs can appear multiple times. For example, this logic might mistakenly treat [1, 1, 2, 4] as [1, 2, 3, 4]. However, both Lookup and Enrich indices don't use this codec.
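Two illustrative fragments for the ideas in this comment (names follow the snippet above; the isDense helper body is a guess at the implementation under review):

// Cap the scan at the requested number of docs, so the final partial
// block can still be copied in a single pass.
int remainingBlockLength = Math.min(
    ES819TSDBDocValuesFormat.NUMERIC_BLOCK_SIZE - blockInIndex,
    docsCount - i
);

// A run of doc IDs is treated as dense when they are consecutive.
// Duplicate doc IDs (possible in Lookup Join / Enrich) would fool this:
// [1, 1, 2, 4] gives last - first == 3 == length - 1, same as [1, 2, 3, 4].
static boolean isDense(int firstDoc, int lastDoc, int length) {
    return lastDoc - firstDoc == length - 1;
}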

Member Author


Can we limit remainingBlockLength to the min of (ES819TSDBDocValuesFormat.NUMERIC_BLOCK_SIZE - blockInIndex, docsCount - i) to allow a single copy of the last block?

Let me try this.

Note that there could be an issue with this logic for Lookup Join and Enrich, as the same doc IDs can appear multiple times. For example, this logic might mistakenly treat [1, 1, 2, 4] as [1, 2, 3, 4]. However, both Lookup and Enrich indices don't use this codec.

I will add a comment about this here.

Member Author


pushed: 6ca5c66

Member


I think we can remove the lastIndex < docsCount check?

public BlockLoader.SingletonLongBuilder appendLongs(long[] newValues, int from, int length) {
try {
System.arraycopy(newValues, from, values, count, length);
} catch (ArrayIndexOutOfBoundsException e) {
Member


leftover?

Member Author


yes, for easy debugging :)

-        return docs[i];
+        try {
+            return docs[i];
+        } catch (ArrayIndexOutOfBoundsException e) {
Member


leftover?

@dnhatn dnhatn self-requested a review August 12, 2025 05:48
Member

@dnhatn dnhatn left a comment


LGTM. Thanks for all the iterations, @martijnvg!

@@ -0,0 +1,5 @@
pr: 132622
summary: Add bulk loading of dense singleton number doc values to tsdb codec and push compute engine value loading for longs down to tsdb codec
Member


I think the summary doesn't match the PR title?

@martijnvg martijnvg enabled auto-merge (squash) August 12, 2025 07:21
@martijnvg martijnvg merged commit 66107f1 into elastic:main Aug 12, 2025
33 checks passed